to extract various interactions of a given data set. In this note, we show that
Efron's statistical curvature of the structural gradient model is less than that 
of a competitive mixture model under a null hypothesis. 
1 Introduction



Exponential families are important in statistical modeling. For example, the Gaussian
family and its subfamilies are often used in multivariate analysis, time-series
analysis, geostatistics and any other areas that deal with quantitative data. Using
the exponential family is reasonable because it is derived from the maximum entropy
criterion (see e.g. Cover and Thomas (2006)). It is also compatible with regression
problems, that is, the generalized linear models (McCullagh and Nelder (1989)). A
comprehensive book on exponential families is Barndorff-Nielsen (1978).

A drawback of exponential families is that the probability density function
sometimes cannot be expressed explicitly because of the normalizing constant. For
example, if one tries to capture three-dimensional interactions of given data, the
corresponding exponential family is not available in explicit form. Although the
Markov chain Monte Carlo procedure is available, it requires some adjustment for
convergence. As an attempt to overcome this difficulty, Sei (2010) suggested a new
parametric family called the structural gradient model (SGM) for multivariate
quantitative data. SGM has been shown numerically to perform well for this purpose.
However, it is not known whether SGM is close to an exponential family. In
this paper, we give a partial answer to this problem.

A measure of closeness to an exponential family is Efron's statistical curvature
$\gamma^2$, referred to as the Efron curvature below. It is defined in terms of the
second-order derivative of the log-likelihood function. See Section 2 for the precise
definition. Efron (1975) showed that the information loss of the maximum likelihood
estimator is asymptotically expanded as $\gamma^2 + o(1)$ as the sample size $N$ goes
to infinity. It is known that $\gamma^2$ vanishes if the model is an exponential
family. Furthermore, $\gamma^2$ is an intrinsic quantity, that is, independent of the
parameterization of the model.

Consider two statistical models $M_1$ and $M_2$, and assume that they have a common
density $p$ and a common score vector at $p$. Then the Fisher information matrix at
$p$ is common to both models, and we can say, without subjectivity, that the model
$M_1$ is closer to an exponential family at $p$ than $M_2$ if the Efron curvature of
$M_1$ is smaller than that of $M_2$.

We compare the Efron curvature of SGM and MixM, a model that competes with SGM.
MixM is an abbreviation of the structural mixture model. Here we briefly
describe SGM and MixM. For details, refer to Section 3 and Sei (2010). SGM is a
statistical model on the hypercube represented by the Fourier-expanded optimal
transport between the target density and the uniform density. Here the Fourier
coefficients are the unknown parameters. For the optimal transport theory, see
Villani (2003) and Villani (2009). MixM is represented by the Fourier expansion of
the probability density function itself. Neither SGM nor MixM needs computation of
normalizing constants, in contrast to the exponential family. We show that the
curvature of SGM is less than that of MixM under the common null hypothesis. In
other words, SGM is closer to an exponential family than MixM. This motivates the
use of SGM rather than MixM for analyzing complicated dependency of given data.

The paper is organized as follows. We recall the definition of the Efron curvature 
in Section 2 and define SGM and MixM in Section 3. Then we state the main result 
of this paper in Section 4. We give some discussion in Section 5. Proofs are given 
in Section 6. 

2 Efron's statistical curvature 

We recall the Efron curvature of a general statistical model according to Efron 
(1975), Reeds (1975) and Amari (1985). Intuitively, the Efron curvature is the 
residual when the second derivative of the log-likelihood is projected onto the linear 
space spanned by the score functions and the constant function. 

Consider a parametric family of density functions $p(x|\theta)$ with respect to a base
measure $dx$, indexed by a parameter vector $\theta = (\theta_u)_{u\in\mathcal{U}}$,
where $\mathcal{U}$ is a finite set. Typically $\mathcal{U} = \{1,\dots,d\}$ with some
$d \ge 1$, but we will consider other cases in the next section. The parameter space
$\Theta$ of $\theta$ is an open subset of $\mathbb{R}^{\mathcal{U}}$, where
$\mathbb{R}^{\mathcal{U}}$ denotes the set of all real vectors
$(\theta_u)_{u\in\mathcal{U}}$ indexed by $\mathcal{U}$. Without loss of generality, we
assume $0 \in \Theta$ and define the curvature at $\theta = 0$.

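Denote the first and second derivatives of the log-likelihood function by

\[ L_u = L_u(x) = \frac{\partial}{\partial\theta_u}\log p(x|\theta)\Big|_{\theta=0}, \qquad
   L_{uv} = L_{uv}(x) = \frac{\partial^2}{\partial\theta_u\partial\theta_v}\log p(x|\theta)\Big|_{\theta=0} \]

for $u,v \in \mathcal{U}$. Define the Fisher information $(J_{uv})_{u,v\in\mathcal{U}}$
and the e-connection coefficients $(\Gamma_{uv,w})_{u,v,w\in\mathcal{U}}$ and
$(\Gamma^{w}_{uv})_{u,v,w\in\mathcal{U}}$ by

\[ J_{uv} = \int p(x|0)\,L_u L_v\,dx, \qquad
   \Gamma_{uv,w} = \int p(x|0)\,L_{uv} L_w\,dx, \qquad
   \Gamma^{w}_{uv} = \sum_{s\in\mathcal{U}} \Gamma_{uv,s} J^{sw}, \]

where $(J^{sw})$ is the inverse matrix of $(J_{sw})$. We define a fourth-order tensor by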



\[ Q_{uv,wz} = \int p(x|0)
   \Big( L_{uv} + J_{uv} - \sum_{s\in\mathcal{U}} \Gamma^{s}_{uv} L_s \Big)
   \Big( L_{wz} + J_{wz} - \sum_{t\in\mathcal{U}} \Gamma^{t}_{wz} L_t \Big)\,dx. \]



Finally, we define the Efron curvature by 

\[ \gamma^2 = \sum_{u,v,w,z\in\mathcal{U}} Q_{uv,wz}\, J^{uw} J^{vz}. \tag{1} \]

The Efron curvature is a non-negative scalar quantity independent of the
parameterization of $p(x|\theta)$.

The Efron curvature is related to the exponential family and information loss
as stated in Section 1. Precise statements are as follows. Recall that a statistical
model $p(x|\theta)$ is called an exponential family (in canonical form) if it is written
as $p(x|\theta) = \exp\big(\sum_{u\in\mathcal{U}} \theta_u t_u(x) - \psi(\theta)\big)$ with
the sufficient statistics $t_u(x)$ and the normalizing function $\psi(\theta)$.

Lemma 1. Let $\Theta$ be an open subset of $\mathbb{R}^{\mathcal{U}}$. Then the Efron
curvature vanishes over $\Theta$ if and only if $p(x|\theta)$ is an exponential family.

Lemma 2. Let $(x_1,\dots,x_N)$ be an i.i.d. sample from a density $p(x|\theta)$. Then,
under some regularity conditions, the information loss of the maximum likelihood
estimator $\hat\theta_N$ is asymptotically

\[ J^{(x_1,\dots,x_N)}_{uv} - J^{(\hat\theta_N)}_{uv} = \sum_{w,z\in\mathcal{U}} Q_{uw,vz}\, J^{wz} + o(1) \]

as $N \to \infty$, where $J^{T}_{uv}$ denotes the Fisher information matrix of a
statistic $T$. Note that $J^{(x_1,\dots,x_N)}_{uv} = N J_{uv}$. In particular, the
averaged information loss is given by

\[ \sum_{u,v\in\mathcal{U}} J^{uv}\big( J^{(x_1,\dots,x_N)}_{uv} - J^{(\hat\theta_N)}_{uv} \big) = \gamma^2 + o(1). \]

For the proof, refer to Efron (1975), Reeds (1975) and Amari (1985). 
3 SGM and MixM



We prepare some notation to define SGM and MixM. Let $m$ be a positive integer.
Denote the gradient operator and Hessian operator on $\mathbb{R}^m$ by
$D = (\partial/\partial x_i)_{i=1}^m$ and
$D^2 = (\partial^2/\partial x_i\partial x_j)_{i,j=1}^m$, respectively. The determinant and
trace of a square matrix $A$ are denoted by $\det A$ and $\mathrm{tr}\,A$, respectively.
For square matrices $A$ and $B$, if $A - B$ is non-negative definite, we write
$A \succeq B$. Let $\mathbb{Z}$ and $\mathbb{Z}_{\ge 0}$ be the set of all integers and
all non-negative integers, respectively. Let
$(\mathbb{Z}^m_{\ge 0})_+ = \mathbb{Z}^m_{\ge 0} \setminus \{0\}$ be the set of all
$m$-dimensional non-negative integer vectors except for the zero vector. Define
$\|u\| = (\sum_{j=1}^m u_j^2)^{1/2}$ for a vector $u = (u_j)_{j=1}^m$. Vectors are
considered as column vectors unless otherwise stated.

We give the definition of SGM and MixM. Examples are given later. 

Definition 1 (SGM). Let $\mathcal{U}$ be a finite subset of $(\mathbb{Z}^m_{\ge 0})_+$.
The structural gradient model (SGM) is a set of probability densities on the hypercube
$[0,1]^m$ with parameter vector $\theta = (\theta_u) \in \mathbb{R}^{\mathcal{U}}$
defined by

\[ p^{(sgm)}(x|\theta) = \det\big(D^2\psi(x|\theta)\big), \qquad
   \psi(x|\theta) = \frac{1}{2}x^\top x - \sum_{u\in\mathcal{U}} \frac{\theta_u}{\pi^2} \prod_{j=1}^m \cos(\pi u_j x_j). \tag{2} \]

The parameter vector $\theta$ is said to be feasible if $D^2\psi(x|\theta) \succeq 0$
for every $x \in [0,1]^m$.

Definition 2 (MixM). Under the same notation as SGM, define

\[ p^{(mix)}(x|\theta) = 1 + \sum_{u\in\mathcal{U}} \theta_u \|u\|^2 \prod_{j=1}^m \cos(\pi u_j x_j). \tag{3} \]

The set of $p^{(mix)}(x|\theta)$ is called MixM in this paper. The parameter vector
$\theta$ is feasible if $p^{(mix)}(x|\theta) \ge 0$ for all $x \in [0,1]^m$.

Remark that both $p^{(sgm)}(x|\theta = 0)$ and $p^{(mix)}(x|\theta = 0)$ are the uniform
density. Define a matrix $H_u(x)$ by

\[ H_u(x) := D^2\Big( -\pi^{-2} \prod_{j=1}^m \cos(\pi u_j x_j) \Big). \tag{4} \]

In particular,

\[ \mathrm{tr}\,H_u(x) = \|u\|^2 \prod_{j=1}^m \cos(\pi u_j x_j). \]

Then we can rewrite (2) and (3) as

\[ p^{(sgm)}(x|\theta) = \det\Big( I + \sum_{u\in\mathcal{U}} \theta_u H_u(x) \Big), \qquad
   p^{(mix)}(x|\theta) = 1 + \sum_{u\in\mathcal{U}} \theta_u\, \mathrm{tr}\,H_u(x). \tag{5} \]
We state a fundamental lemma. For completeness, we prove it in Section 6. We
denote the indicator function of a set $A$ by $1_A$.

Lemma 3 (Sei (2010) Lemma 3). The score vector at $\theta = 0$ of both SGM and
MixM is $(\mathrm{tr}\,H_u(x))_{u\in\mathcal{U}}$. The common Fisher information matrix
$J = (J_{uv})_{u,v\in\mathcal{U}}$ at $\theta = 0$ is
$J_{uv} = \|u\|^4\, 2^{-|\sigma(u)|}\, 1_{\{u=v\}}$, where
$\sigma(u) = \{ j \in \{1,\dots,m\} \mid u_j > 0 \}$ and $|\sigma(u)|$ denotes the
cardinality of $\sigma(u)$. In particular, $J_{uv}$ is diagonal.

We give a few examples, where we write $(u_1,\dots,u_m)$ instead of
$(u_1,\dots,u_m)^\top$ for simplicity.

Example 1. Let $m = 2$ and $\mathcal{U} = \{(1,1)\}$. We abbreviate $\theta_{(1,1)}$ as
$\theta$ for simplicity. Then we have

\[ p^{(sgm)}(x|\theta) = \det \begin{pmatrix}
   1 + \theta\cos(\pi x_1)\cos(\pi x_2) & -\theta\sin(\pi x_1)\sin(\pi x_2) \\
   -\theta\sin(\pi x_1)\sin(\pi x_2) & 1 + \theta\cos(\pi x_1)\cos(\pi x_2)
   \end{pmatrix} \]
\[ \phantom{p^{(sgm)}(x|\theta)} = 1 + 2\theta\cos(\pi x_1)\cos(\pi x_2) + \theta^2\{\cos^2(\pi x_1) + \cos^2(\pi x_2) - 1\} \]

and $p^{(mix)}(x|\theta) = 1 + 2\theta\cos(\pi x_1)\cos(\pi x_2)$. SGM is feasible if and
only if $|\theta| \le 1$. MixM is feasible if and only if $|\theta| \le 1/2$.
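The formulas of Example 1 can be checked numerically. The following is a minimal sketch, not part of the original note: the function names are hypothetical, and the integration uses a plain midpoint rule on $[0,1]^2$. It compares the determinant with its expanded form, confirms that both densities integrate to one, and probes the feasibility bounds on a grid.

```python
import math

def sgm_density(x1, x2, theta):
    """det(I + theta * H_u(x)) for m = 2 and u = (1, 1)."""
    c = math.cos(math.pi * x1) * math.cos(math.pi * x2)  # diagonal entry of H_u
    s = math.sin(math.pi * x1) * math.sin(math.pi * x2)  # minus the off-diagonal entry
    return (1 + theta * c) ** 2 - (theta * s) ** 2

def sgm_closed_form(x1, x2, theta):
    """Expanded form of the same determinant, as stated in Example 1."""
    c1, c2 = math.cos(math.pi * x1), math.cos(math.pi * x2)
    return 1 + 2 * theta * c1 * c2 + theta ** 2 * (c1 ** 2 + c2 ** 2 - 1)

def mixm_density(x1, x2, theta):
    """MixM density for the same index set: ||u||^2 = 2."""
    return 1 + 2 * theta * math.cos(math.pi * x1) * math.cos(math.pi * x2)

def min_hessian_eigenvalue(x1, x2, theta):
    """Smaller eigenvalue of the symmetric 2x2 matrix I + theta * H_u(x)."""
    c = math.cos(math.pi * x1) * math.cos(math.pi * x2)
    s = math.sin(math.pi * x1) * math.sin(math.pi * x2)
    return 1 + theta * c - abs(theta * s)

def midpoint_integral(density, theta, n=50):
    """Midpoint-rule integral of density(., ., theta) over the unit square."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += density((i + 0.5) / n, (j + 0.5) / n, theta)
    return total / (n * n)

def grid(n=20):
    """Grid on [0, 1]^2 including the boundary, where feasibility is tightest."""
    return [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]
```

On the grid, the smallest eigenvalue of the Hessian stays non-negative at $\theta = 1$ and becomes negative slightly beyond it, and the MixM density behaves the same way around $\theta = 1/2$.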

Example 2. Let $m = 3$ and
$\mathcal{U} = \{(1,0,0), (2,0,0), (1,1,0), (2,1,0), (1,1,1)\}$. Then the diagonal part
$J_u := J_{uu}$ of the Fisher information matrix is

\[ J_{(1,0,0)} = \frac{1}{2}, \quad J_{(2,0,0)} = 8, \quad J_{(1,1,0)} = 1, \quad
   J_{(2,1,0)} = \frac{25}{4}, \quad J_{(1,1,1)} = \frac{9}{8}. \]
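The values in Example 2 follow directly from the formula of Lemma 3. As a small illustration (the function names are hypothetical and not from the original note), the sketch below evaluates the diagonal Fisher information exactly with rational arithmetic, and cross-checks it against a midpoint-rule integral of the squared score $(\mathrm{tr}\,H_u)^2$ under the uniform density.

```python
import math
from fractions import Fraction

def fisher_diag(u):
    """J_uu = ||u||^4 * 2^(-|sigma(u)|), the formula of Lemma 3."""
    norm_sq = sum(c * c for c in u)
    support_size = sum(1 for c in u if c > 0)
    return Fraction(norm_sq ** 2, 2 ** support_size)

def fisher_diag_numeric(u, n=64):
    """Midpoint-rule value of the integral of (tr H_u(x))^2 over [0, 1]^m.

    The integrand ||u||^4 * prod_j cos^2(pi u_j x_j) factorizes over the
    coordinates, so a product of one-dimensional sums is enough.
    """
    value = float(sum(c * c for c in u)) ** 2
    for c in u:
        value *= sum(math.cos(math.pi * c * (k + 0.5) / n) ** 2 for k in range(n)) / n
    return value

example_2 = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (2, 1, 0), (1, 1, 1)]
for u in example_2:
    print(u, fisher_diag(u), round(fisher_diag_numeric(u), 10))
```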

4 Main result 

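Consider a finite subset $\mathcal{U}$ of $(\mathbb{Z}^m_{\ge 0})_+$. Let
$(\gamma^2_{\mathcal{U}})^{(sgm)}$ and $(\gamma^2_{\mathcal{U}})^{(mix)}$ be the Efron
curvature (1) of SGM and MixM at $\theta = 0$, respectively. For each
$i \in \{1,\dots,m\}$, we set
$Z_i = \{ u \in (\mathbb{Z}^m_{\ge 0})_+ \mid u_j = 0 \text{ if } j \ne i \}$.
Our main result is the following theorem.

Theorem 4. For any finite $\mathcal{U} \subset (\mathbb{Z}^m_{\ge 0})_+$, the following
inequality holds:

\[ 0 < (\gamma^2_{\mathcal{U}})^{(sgm)} \le (\gamma^2_{\mathcal{U}})^{(mix)}. \tag{6} \]

Equality holds if and only if there is some $i \in \{1,\dots,m\}$ such that
$\mathcal{U} \subset Z_i$. If the equality holds, then the two models coincide.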

We give more explicit expressions of the two quantities. We prepare some additional
notation. For a vector $U = (U_i) \in \mathbb{Z}^m$, its component-wise absolute value
is denoted by $\mathrm{abs}(U) = (|U_i|)$. For two vectors $U = (U_i)$ and $V = (V_i)$,
their component-wise product (Hadamard product) is denoted by $U \circ V = (U_i V_i)$.
Let $\beta = (\beta_i) \in \{-1,1\}^m$ be a Bernoulli sequence, that is, each $\beta_i$
independently takes the values $\pm 1$ with probability 1/2 each. For a Bernoulli
sequence $\beta$ and a vector $u \in \mathcal{U}$, we call the vector $U = \beta \circ u$
the Bernoulli randomization of $u$. The expectation with respect to $U$ (inherited from
$\beta$) is denoted by $E_U$. If Bernoulli randomizations of two or more (possibly the
same) vectors are considered, then they are assumed to be independent. Recall that
$\|u\| = (\sum_{j=1}^m u_j^2)^{1/2}$ and $\sigma(u) = \{ j \mid u_j > 0 \}$.

The explicit expression of the Efron curvature is given in the following theorem. 
The inequality (6) is obtained as a corollary. 



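Theorem 5. The Efron curvature of SGM and MixM at $\theta = 0$ is given by

\[ (\gamma^2_{\mathcal{U}})^{(sgm)} = \sum_{u,v\in\mathcal{U}}
   E_{U,V,\tilde U,\tilde V}\big[ \omega_{\mathcal{U}}(U,V,\tilde U,\tilde V)\,(U^\top V)^2 (\tilde U^\top \tilde V)^2 \big]\,
   \frac{2^{|\sigma(u)|+|\sigma(v)|}}{\|u\|^4 \|v\|^4}, \tag{7} \]
\[ (\gamma^2_{\mathcal{U}})^{(mix)} = \sum_{u,v\in\mathcal{U}}
   E_{U,V,\tilde U,\tilde V}\big[ \omega_{\mathcal{U}}(U,V,\tilde U,\tilde V) \big]\,
   2^{|\sigma(u)|+|\sigma(v)|}, \tag{8} \]

where $U, V, \tilde U, \tilde V$ are Bernoulli randomizations of $u, v, u, v$,
respectively, and

\[ \omega_{\mathcal{U}}(U,V,\tilde U,\tilde V) = 1_{\{ U+V+\tilde U+\tilde V = 0,\ \mathrm{abs}(U+V) \notin \mathcal{U}\cup\{0\} \}}. \]

In particular, $(\gamma^2_{\mathcal{U}})^{(sgm)}$ and $(\gamma^2_{\mathcal{U}})^{(mix)}$
are rational numbers.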

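Table 1 shows the Efron curvature for several specific cases of $\mathcal{U}$. Let
$e_i = (1_{\{j=i\}})_{j=1}^m$, the $i$-th unit vector.

Table 1: The Efron curvature for several cases of $\mathcal{U}$.

  $\mathcal{U}$                                   $(\gamma^2_{\mathcal{U}})^{(sgm)}$    $(\gamma^2_{\mathcal{U}})^{(mix)}$
  ----------------------------------------------------------------------------------------------------
  $\{f e_i\}_{1\le f\le d,\ 1\le i\le m}$         $2^{-2} d(d+1)m$                      $2^{-2} d(d+1)m + d^2 m(m-1)$
  $\{e_i + e_j\}_{1\le i<j\le m}$                 $2^{-5} m(m-1)(m+2)$                  $2^{-3} m(m-1)(2m^2-6m+9)$
  $\{e_i + e_{i+1}\}_{i=1}^{m-1}$                 $2^{-4}(7m-10)$                       $2^{-2}(4m^2-3m-5)$
  $\{e_1 + e_i\}_{i=2}^{m}$                       $2^{-5}(m-1)(3m+2)$                   $2^{-2}(m-1)(6m-7)$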
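For small index sets, the expectations in (7) and (8) can be evaluated exactly by enumerating all sign patterns. The sketch below is an illustration under that brute-force approach, with hypothetical function names; it is not the method used to derive the closed forms in Table 1. It tabulates the distribution of $U + V$ once per pair $(u, v)$ and pairs each value $s$ with $-s$ for the independent copy, which is what the constraint $U + V + \tilde U + \tilde V = 0$ requires.

```python
from fractions import Fraction
from itertools import product

def randomizations(u):
    """All equally likely Bernoulli randomizations beta∘u (zero coordinates stay zero)."""
    support = [j for j, c in enumerate(u) if c != 0]
    out = []
    for signs in product((1, -1), repeat=len(support)):
        w = list(u)
        for j, s in zip(support, signs):
            w[j] = s * u[j]
        out.append(tuple(w))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def curvatures(index_set):
    """Return (sgm, mix): formulas (7) and (8) evaluated exactly as Fractions."""
    index_set = [tuple(u) for u in index_set]
    members = set(index_set)
    zero = (0,) * len(index_set[0])
    sgm, mix = Fraction(0), Fraction(0)
    for u in index_set:
        for v in index_set:
            ru, rv = randomizations(u), randomizations(v)
            total = len(ru) * len(rv)          # equals 2^(|sigma(u)| + |sigma(v)|)
            count, weight = {}, {}             # per value s of U + V
            for U in ru:
                for V in rv:
                    s = tuple(a + b for a, b in zip(U, V))
                    count[s] = count.get(s, 0) + 1
                    weight[s] = weight.get(s, 0) + dot(U, V) ** 2
            e_mix, e_sgm = 0, 0
            for s in count:
                a = tuple(abs(x) for x in s)
                if a == zero or a in members:
                    continue                   # omega vanishes for this value of U + V
                neg = tuple(-x for x in s)     # the independent copy must sum to -s
                e_mix += count[s] * count.get(neg, 0)
                e_sgm += weight[s] * weight.get(neg, 0)
            # E[...] = e / total^2, then multiply by 2^(|sigma(u)|+|sigma(v)|) = total.
            mix += Fraction(e_mix, total)
            sgm += Fraction(e_sgm, total * dot(u, u) ** 2 * dot(v, v) ** 2)
    return sgm, mix

# All pairs e_i + e_j for m = 3 (second row of Table 1).
print(curvatures([(1, 1, 0), (1, 0, 1), (0, 1, 1)]))
```

The cost grows exponentially with the support sizes, so this is only practical for small $m$.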



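We end with an asymptotic property. For the first three examples in Table 1, it
is easily confirmed that
$(\gamma^2_{\mathcal{U}})^{(sgm)} / (\gamma^2_{\mathcal{U}})^{(mix)}$ converges to 0 as
$m \to \infty$. This property holds in a more general setting. We define two sets
$M(\mathcal{U})$ and $N(\mathcal{U})$ by

\[ M(\mathcal{U}) = \{ (u,v) \in \mathcal{U}^2 \mid u + v \notin \mathcal{U} \}, \]
\[ N(\mathcal{U}) = \{ (u,v) \in \mathcal{U}^2 \mid \sigma(u) \cap \sigma(v) \ne \emptyset \}. \]

We denote the cardinality of a set $A$ by $|A|$.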
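Theorem 6. Let $\mathcal{U}_m$ be a finite subset of $(\mathbb{Z}^m_{\ge 0})_+$ for each
$m \in \{1,2,\dots\}$. Assume that $\max_{u\in\mathcal{U}_m} |\sigma(u)|$ is bounded over
$m$. Further assume $|N(\mathcal{U}_m)| / |M(\mathcal{U}_m)| \to 0$ as $m \to \infty$.
Then $(\gamma^2_{\mathcal{U}_m})^{(sgm)} / (\gamma^2_{\mathcal{U}_m})^{(mix)} \to 0$ as
$m \to \infty$.

Let $\mu(\mathcal{U})$ be the set of maximal elements of $\mathcal{U}$, that is,

\[ \mu(\mathcal{U}) = \{ u \in \mathcal{U} \mid \forall v \in \mathcal{U}\setminus\{u\},\ \exists i \in \{1,\dots,m\} \text{ s.t. } v_i < u_i \}. \]

Corollary 7. Let $\mathcal{U}_m$ be a finite subset of $(\mathbb{Z}^m_{\ge 0})_+$ for
each $m \in \{1,2,\dots\}$. Assume that $\max_{u\in\mathcal{U}_m} |\sigma(u)|$ is bounded
over $m$. Further assume $|N(\mathcal{U}_m)| / |\mu(\mathcal{U}_m)|^2 \to 0$ as
$m \to \infty$. Then
$(\gamma^2_{\mathcal{U}_m})^{(sgm)} / (\gamma^2_{\mathcal{U}_m})^{(mix)} \to 0$ as
$m \to \infty$.

Table 2 shows the numbers $|N(\mathcal{U})|$ and $|\mu(\mathcal{U})|$ for the examples in
Table 1. It is consistent with Corollary 7, that is,
$|N(\mathcal{U})| / |\mu(\mathcal{U})|^2 \to 0$ only for the first three cases.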

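Table 2: The numbers $|N(\mathcal{U})|$ and $|\mu(\mathcal{U})|$.

  $\mathcal{U}$                                   $|N(\mathcal{U})|$            $|\mu(\mathcal{U})|$
  ------------------------------------------------------------------------------------------
  $\{f e_i\}_{1\le f\le d,\ 1\le i\le m}$         $d^2 m$                       $m$
  $\{e_i + e_j\}_{1\le i<j\le m}$                 $2^{-1} m(m-1)(2m-3)$         $2^{-1} m(m-1)$
  $\{e_i + e_{i+1}\}_{i=1}^{m-1}$                 $3m - 5$                      $m - 1$
  $\{e_1 + e_i\}_{i=2}^{m}$                       $(m-1)^2$                     $m - 1$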
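The counts in Table 2 can be reproduced by direct enumeration from the definitions of $N(\mathcal{U})$, $M(\mathcal{U})$ and $\mu(\mathcal{U})$. The following sketch is illustrative only; the builder functions for the four families are hypothetical helpers introduced here.

```python
def support(u):
    """sigma(u): indices of the positive coordinates."""
    return {j for j, c in enumerate(u) if c > 0}

def count_N(index_set):
    """|N(U)|: ordered pairs whose supports intersect."""
    return sum(1 for u in index_set for v in index_set if support(u) & support(v))

def count_M(index_set):
    """|M(U)|: ordered pairs whose sum is not in U."""
    members = set(index_set)
    return sum(1 for u in index_set for v in index_set
               if tuple(a + b for a, b in zip(u, v)) not in members)

def maximal_elements(index_set):
    """mu(U): u such that every other v is strictly smaller in some coordinate."""
    return [u for u in index_set
            if all(any(vi < ui for vi, ui in zip(v, u)) for v in index_set if v != u)]

def unit(i, m):
    return tuple(1 if j == i else 0 for j in range(m))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

# The four families of Tables 1 and 2 (0-based indices internally).
def multiples(d, m):
    return [tuple(f * c for c in unit(i, m)) for f in range(1, d + 1) for i in range(m)]

def all_pairs(m):
    return [add(unit(i, m), unit(j, m)) for i in range(m) for j in range(i + 1, m)]

def chain(m):
    return [add(unit(i, m), unit(i + 1, m)) for i in range(m - 1)]

def star(m):
    return [add(unit(0, m), unit(i, m)) for i in range(1, m)]

m, d = 5, 2
for name, fam in [("multiples", multiples(d, m)), ("all_pairs", all_pairs(m)),
                  ("chain", chain(m)), ("star", star(m))]:
    print(name, count_N(fam), count_M(fam), len(maximal_elements(fam)))
```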



5 Discussion 

We evaluated the Efron curvature of SGM and MixM (Theorem 5) and used it to 
show that SGM has smaller curvature than MixM (Theorem 4). Here we give some 
unsolved problems. 

In Table 1, we listed explicit formulas of the Efron curvature for specific
$\mathcal{U}$'s by using (7) and (8). It is challenging to derive formulas for more
practical sets, such as

\[ \mathcal{U} = \{ u \in (\mathbb{Z}^m_{\ge 0})_+ \mid \|u\|_1 \le 3,\ \|u\|_\infty \le 2 \}, \qquad
   \|u\|_1 = \sum_{j=1}^m u_j, \quad \|u\|_\infty = \max_j u_j. \]

Sei (2010) used this set to analyze multivariate datasets. For each small $m$, we
can evaluate the curvature by direct computation. However, the computation has
exponential complexity with respect to the dimension $m$ as long as one uses (7) and
(8). Combinatorial methods may solve the problem.
We studied the averaged curvature $\gamma^2$. Instead, one can consider a tensor
$H_{uv} := \sum_{w,z} Q_{uw,vz} J^{wz}$ appearing in Lemma 2, which is called the
embedding e-curvature (Amari (1985)). Although an inequality
$H^{(sgm)}_{uv} \preceq H^{(mix)}_{uv}$ is conjectured by numerical study, it could not
be proved.

In this paper, we only considered the curvature at the origin $\theta = 0$. This
restriction comes from two different kinds of difficulty. One is a conceptual
difficulty: the probability densities (and score vectors) of SGM and MixM are different
except at $\theta = 0$. An approach may be to consider a local mixture model of SGM
at each point $\theta$ (Marriott (2002)). The other kind of difficulty is computational.
The expression of the Efron curvature of SGM at $\theta \ne 0$ seems complicated.
Even the Fisher information matrix $J_{uv}$ is not written in elementary functions in
general. However, the expression is written at least in terms of integration of
multi-dimensional rational functions because $p(x|\theta)$ is a polynomial of
$\theta_u$ and $z_j = e^{i\pi x_j}$. Algebraic methods on integration may be helpful.

6 Proofs 

6.1 Proof of Lemma 3 and Theorem 5 

We calculate the Efron curvature of SGM and MixM step-by-step. 

For SGM, we denote the quantities $L_{uv}(x)$, $\Gamma_{uv,w}$, $\Gamma^{w}_{uv}$,
$Q_{uv,wz}$, $\gamma^2$ in Section 2 by $L^{(sgm)}_{uv}(x)$, $\Gamma^{(sgm)}_{uv,w}$,
$(\Gamma^{w}_{uv})^{(sgm)}$, $Q^{(sgm)}_{uv,wz}$, $(\gamma^2)^{(sgm)}$, respectively.
Similarly, for MixM, we denote them by $L^{(mix)}_{uv}(x)$, $\Gamma^{(mix)}_{uv,w}$,
$(\Gamma^{w}_{uv})^{(mix)}$, $Q^{(mix)}_{uv,wz}$, $(\gamma^2)^{(mix)}$. We use $L_u(x)$
and $J_{uv}$ without superscripts because they are common to both models. Recall that a
random matrix $H_u = H_u(x)$ is defined by (4).

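Lemma 8. For any $u,v \in \mathcal{U}$, the following equalities hold:

\[ L_u(x) = \mathrm{tr}\,H_u, \qquad L^{(sgm)}_{uv}(x) = -\mathrm{tr}(H_u H_v), \qquad
   L^{(mix)}_{uv}(x) = -(\mathrm{tr}\,H_u)(\mathrm{tr}\,H_v). \]

Proof. By (5), the log-likelihoods of SGM and MixM are expanded around $\theta = 0$ as

\[ \log p^{(sgm)}(x|\theta) = \sum_{u} \theta_u\,\mathrm{tr}\,H_u - \frac{1}{2}\sum_{u,v} \theta_u\theta_v\,\mathrm{tr}(H_u H_v) + O(\|\theta\|^3), \]
\[ \log p^{(mix)}(x|\theta) = \sum_{u} \theta_u\,\mathrm{tr}\,H_u - \frac{1}{2}\sum_{u,v} \theta_u\theta_v\,(\mathrm{tr}\,H_u)(\mathrm{tr}\,H_v) + O(\|\theta\|^3). \]

Then the result follows. □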
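The expansions used in the proof of Lemma 8 rest on two standard Taylor series. As a worked sketch (the shorthand $A$ and $a$ is introduced here only for illustration), write $A = \sum_u \theta_u H_u$ and $a = \sum_u \theta_u\,\mathrm{tr}\,H_u$:

```latex
\begin{align*}
\log\det(I + A) &= \operatorname{tr}\log(I + A)
   = \operatorname{tr} A - \tfrac{1}{2}\operatorname{tr}(A^2) + O(\|A\|^3), \\
\operatorname{tr}(A^2) &= \sum_{u,v} \theta_u \theta_v \operatorname{tr}(H_u H_v), \\
\log(1 + a) &= a - \tfrac{1}{2}a^2 + O(|a|^3),
   \qquad a^2 = \sum_{u,v} \theta_u \theta_v (\operatorname{tr} H_u)(\operatorname{tr} H_v).
\end{align*}
```

Differentiating once in $\theta_u$ at $\theta = 0$ gives the common score $\mathrm{tr}\,H_u$, and differentiating twice in $\theta_u, \theta_v$ gives the two second derivatives stated in the lemma, since both quadratic forms are symmetric in $(u,v)$.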

Since the random variables $L_u(x)$, $L^{(sgm)}_{uv}(x)$ and $L^{(mix)}_{uv}(x)$ are
written in terms of $H_u$, it is useful to consider moment formulas of $H_u$.

Lemma 9. Let $u \in \mathcal{U}$. Let $U$ be the Bernoulli randomization of $u$. Then
$H_u$ is written as $H_u = E_U[e^{i\pi U^\top x}\,U U^\top]$. Furthermore, the random
variable $x$ can be replaced with a random variable $\xi$ uniformly distributed on
$[-1,1]^m$, when any moment of $\mathrm{tr}(H_u)$ and $\mathrm{tr}(H_u H_v)$ is
evaluated.

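Proof. By Euler's formula $\cos(\pi t) = (e^{i\pi t} + e^{-i\pi t})/2$, we obtain

\[ \prod_{j=1}^m \cos(\pi u_j x_j) = E_U[e^{i\pi U^\top x}]. \]

Therefore $H_u = E_U[e^{i\pi U^\top x}\,U U^\top]$. Next we consider moments. Consider,
for example, the expectation of $\mathrm{tr}(H_u H_v)$. The other moments are similarly
evaluated. Let $\beta$ be a Bernoulli sequence, which is independent of $x$ and any
other Bernoulli sequences. Put $\xi = \beta \circ x$. Then $\xi$ has the uniform
distribution on $[-1,1]^m$, and

\[ E_\xi[\mathrm{tr}(H_u(\xi) H_v(\xi))]
   = E_{\xi,U,V}\big[ e^{i\pi U^\top \xi} e^{i\pi V^\top \xi}\,\mathrm{tr}(U U^\top V V^\top) \big] \]
\[ \phantom{E_\xi[\mathrm{tr}(H_u(\xi) H_v(\xi))]}
   = E_{\xi,U,V}\big[ e^{i\pi U^\top \xi} e^{i\pi V^\top \xi}\,(U^\top V)^2 \big] \]
\[ \phantom{E_\xi[\mathrm{tr}(H_u(\xi) H_v(\xi))]}
   = E_{x,\beta,U,V}\big[ e^{i\pi (U\circ\beta)^\top x} e^{i\pi (V\circ\beta)^\top x}\,(U^\top V)^2 \big] \]
\[ \phantom{E_\xi[\mathrm{tr}(H_u(\xi) H_v(\xi))]}
   = E_x[\mathrm{tr}(H_u(x) H_v(x))], \]

where we put $\tilde U = U \circ \beta$ and $\tilde V = V \circ \beta$, and used the
identity $\tilde U^\top \tilde V = U^\top V$. □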

From Lemma 9, we simply write $H_u = E_U[e^{i\pi U^\top \xi}\,U U^\top]$ below, and the
expectation with respect to $x$ is replaced with the expectation with respect to
$\xi$. Note that $E_\xi[e^{i\pi a^\top \xi}] = 1_{\{a=0\}}$ for any $a \in \mathbb{Z}^m$.
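Both identities used here, the product-of-cosines formula from the proof of Lemma 9 and the orthogonality relation for $\xi$ uniform on $[-1,1]^m$, are easy to confirm numerically. The sketch below is illustrative; the function names are hypothetical, and the uniform expectation is approximated by a midpoint rule per coordinate.

```python
import cmath
import math
from itertools import product

def cos_product(u, x):
    """prod_j cos(pi * u_j * x_j)."""
    return math.prod(math.cos(math.pi * uj * xj) for uj, xj in zip(u, x))

def randomized_mean(u, x):
    """E_U[exp(i*pi*U^T x)], averaging over all 2^m sign vectors beta with U = beta∘u."""
    m = len(u)
    total = 0j
    for beta in product((1, -1), repeat=m):
        phase = math.pi * sum(b * uj * xj for b, uj, xj in zip(beta, u, x))
        total += cmath.exp(1j * phase)
    return total / 2 ** m

def uniform_char(a, n=16):
    """Midpoint-rule value of E_xi[exp(i*pi*a^T xi)] for xi uniform on [-1, 1]^m.

    The expectation factorizes over coordinates; n should exceed every |a_j|.
    """
    value = 1 + 0j
    for aj in a:
        nodes = (-1 + (2 * k + 1) / n for k in range(n))
        value *= sum(cmath.exp(1j * math.pi * aj * t) for t in nodes) / n
    return value

u, x = (2, 1, 0), (0.3, 0.7, 0.1)
print(cos_product(u, x), randomized_mean(u, x))
print(abs(uniform_char((0, 0))), abs(uniform_char((1, -2))))
```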

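Now the Fisher information matrix is evaluated as

\[ J_{uv} = E_\xi[\mathrm{tr}\,H_u\,\mathrm{tr}\,H_v] \]
\[ \phantom{J_{uv}} = E_{\xi,U,V}\big[ e^{i\pi (U+V)^\top \xi}\,\|u\|^2 \|v\|^2 \big] \]
\[ \phantom{J_{uv}} = E_{U,V}\big[ 1_{\{U+V=0\}}\,\|u\|^2 \|v\|^2 \big] \]
\[ \phantom{J_{uv}} = E_{\beta,\tilde\beta}\big[ 1_{\{\beta\circ u = -\tilde\beta\circ v\}} \big]\,\|u\|^2 \|v\|^2 \]
\[ \phantom{J_{uv}} = E_{\beta,\tilde\beta}\Big[ \prod_{i=1}^m \big\{ 1_{\{u_i = v_i = 0\}} + 1_{\{u_i = v_i > 0,\ \beta_i = -\tilde\beta_i\}} \big\} \Big]\,\|u\|^2 \|v\|^2 \]
\[ \phantom{J_{uv}} = 1_{\{u=v\}}\, 2^{-|\sigma(u)|}\, \|u\|^4, \]

where $\beta$ and $\tilde\beta$ are Bernoulli sequences. This proves Lemma 3. By similar
computation, we have the following lemma.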
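Lemma 10. Let $U, V, S$ be Bernoulli randomizations of $u, v, s \in \mathcal{U}$. Then

\[ (\Gamma^{w}_{uv})^{(sgm)} = -E_{U,V}\big[ 1_{\{\mathrm{abs}(U+V) = w\}}\,(U^\top V)^2\,\|w\|^{-2} \big], \]
\[ (\Gamma^{w}_{uv})^{(mix)} = -E_{U,V}\big[ 1_{\{\mathrm{abs}(U+V) = w\}}\,\|u\|^2 \|v\|^2 \|w\|^{-2} \big]. \]

Proof. We first calculate $\Gamma^{(sgm)}_{uv,s}$. By Lemma 8 and Lemma 9, we have

\[ \Gamma^{(sgm)}_{uv,s} = -E_\xi[\mathrm{tr}(H_u H_v)\,\mathrm{tr}\,H_s] \]
\[ \phantom{\Gamma^{(sgm)}_{uv,s}} = -E_{\xi,U,V,S}\big[ e^{i\pi (U+V+S)^\top \xi}\,(U^\top V)^2 \|s\|^2 \big] \]
\[ \phantom{\Gamma^{(sgm)}_{uv,s}} = -E_{U,V,S}\big[ 1_{\{U+V+S=0\}}\,(U^\top V)^2 \|s\|^2 \big]. \]

By using the expressions of $\Gamma^{(sgm)}_{uv,s}$ and $J^{sw}$, we have

\[ (\Gamma^{w}_{uv})^{(sgm)} = \sum_{s\in\mathcal{U}} \Gamma^{(sgm)}_{uv,s}\, J^{sw} \]
\[ \phantom{(\Gamma^{w}_{uv})^{(sgm)}} = -\sum_{s\in\mathcal{U}} E_{U,V,S}\big[ 1_{\{U+V+S=0\}}\,(U^\top V)^2 \|s\|^2 \big]\, 1_{\{s=w\}}\, 2^{|\sigma(s)|} \|s\|^{-4} \]
\[ \phantom{(\Gamma^{w}_{uv})^{(sgm)}} = -E_{U,V,W}\big[ 1_{\{U+V+W=0\}}\,(U^\top V)^2 \|w\|^{-2}\, 2^{|\sigma(w)|} \big] \]
\[ \phantom{(\Gamma^{w}_{uv})^{(sgm)}} = -E_{U,V,\beta}\big[ 1_{\{\mathrm{abs}(U+V) = w\}}\, 1_{\{U+V = \beta\circ w\}}\,(U^\top V)^2 \|w\|^{-2}\, 2^{|\sigma(w)|} \big] \]
\[ \phantom{(\Gamma^{w}_{uv})^{(sgm)}} = -E_{U,V}\big[ 1_{\{\mathrm{abs}(U+V) = w\}}\,(U^\top V)^2 \|w\|^{-2} \big], \]

where $\beta$ is a Bernoulli sequence. The expressions of $\Gamma^{(mix)}_{uv,s}$ and
$(\Gamma^{w}_{uv})^{(mix)}$ are obtained similarly. □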

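Lemma 11. The curvature tensors of SGM and MixM at $\theta = 0$ are

\[ Q^{(sgm)}_{uv,wz} = E_{U,V,W,Z}\big[ \omega_{\mathcal{U}}(U,V,W,Z)\,(U^\top V)^2 (W^\top Z)^2 \big], \]
\[ Q^{(mix)}_{uv,wz} = E_{U,V,W,Z}\big[ \omega_{\mathcal{U}}(U,V,W,Z)\,\|u\|^2 \|v\|^2 \|w\|^2 \|z\|^2 \big], \]

respectively, where $U, V, W, Z$ are Bernoulli randomizations of $u, v, w, z$ and

\[ \omega_{\mathcal{U}}(U,V,W,Z) = 1_{\{ U+V+W+Z = 0,\ \mathrm{abs}(U+V) \notin \mathcal{U}\cup\{0\} \}}. \]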

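Proof. We only derive the expression of $Q^{(sgm)}_{uv,wz}$. The expression of
$Q^{(mix)}_{uv,wz}$ is obtained similarly. We first prove

\[ R^{(sgm)}_{uv}(\xi) := L^{(sgm)}_{uv}(\xi) + J_{uv} - \sum_{s\in\mathcal{U}} (\Gamma^{s}_{uv})^{(sgm)} L_s(\xi)
   = -E_{U,V}\big[ 1_{\{\mathrm{abs}(U+V) \notin \mathcal{U}\cup\{0\}\}}\, e^{i\pi (U+V)^\top \xi}\,(U^\top V)^2 \big]. \tag{9} \]

The last term of $R^{(sgm)}_{uv}(\xi)$ is

\[ \sum_{s\in\mathcal{U}} (\Gamma^{s}_{uv})^{(sgm)} L_s
   = -\sum_{s\in\mathcal{U}} E_{U,V,S}\big[ 1_{\{\mathrm{abs}(U+V) = s\}}\, e^{i\pi S^\top \xi}\,(U^\top V)^2 \big] \]
\[ \phantom{\sum_{s\in\mathcal{U}} (\Gamma^{s}_{uv})^{(sgm)} L_s}
   = -\sum_{s\in\mathcal{U}} E_{U,V,\beta}\big[ 1_{\{\mathrm{abs}(U+V) = s\}}\, e^{i\pi (\beta\circ(U+V))^\top \xi}\,(U^\top V)^2 \big] \]
\[ \phantom{\sum_{s\in\mathcal{U}} (\Gamma^{s}_{uv})^{(sgm)} L_s}
   = -E_{U,V,\beta}\big[ 1_{\{\mathrm{abs}(U+V) \in \mathcal{U}\}}\, e^{i\pi (\beta\circ(U+V))^\top \xi}\,(U^\top V)^2 \big] \]
\[ \phantom{\sum_{s\in\mathcal{U}} (\Gamma^{s}_{uv})^{(sgm)} L_s}
   = -E_{U,V}\big[ 1_{\{\mathrm{abs}(U+V) \in \mathcal{U}\}}\, e^{i\pi (U+V)^\top \xi}\,(U^\top V)^2 \big], \]

where $\beta$ is a Bernoulli sequence. For the first and second terms of
$R^{(sgm)}_{uv}(\xi)$, we have

\[ L^{(sgm)}_{uv} = -E_{U,V}\big[ e^{i\pi (U+V)^\top \xi}\,(U^\top V)^2 \big], \]
\[ J_{uv} = E_{U,V}\big[ 1_{\{U+V=0\}}\,(U^\top V)^2 \big]
   = E_{U,V}\big[ 1_{\{\mathrm{abs}(U+V) = 0\}}\, e^{i\pi (U+V)^\top \xi}\,(U^\top V)^2 \big]. \]

Hence (9) is obtained. Now the tensor $Q^{(sgm)}_{uv,wz}$ is calculated as follows:

\[ Q^{(sgm)}_{uv,wz} = E_\xi\big[ R^{(sgm)}_{uv} R^{(sgm)}_{wz} \big] \]
\[ \phantom{Q^{(sgm)}_{uv,wz}} = E_{\xi,U,V,W,Z}\big[ 1_{\{\mathrm{abs}(U+V) \notin \mathcal{U}\cup\{0\},\ \mathrm{abs}(W+Z) \notin \mathcal{U}\cup\{0\}\}}\, e^{i\pi (U+V+W+Z)^\top \xi}\,(U^\top V)^2 (W^\top Z)^2 \big] \]
\[ \phantom{Q^{(sgm)}_{uv,wz}} = E_{U,V,W,Z}\big[ 1_{\{U+V+W+Z=0,\ \mathrm{abs}(U+V) \notin \mathcal{U}\cup\{0\}\}}\,(U^\top V)^2 (W^\top Z)^2 \big]. \]

Therefore we obtain the desired expression. □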



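We finally prove Theorem 5. Since the Fisher information matrix is diagonal, we
have

\[ (\gamma^2)^{(sgm)} = \sum_{u,v,w,z\in\mathcal{U}} Q^{(sgm)}_{uv,wz}\, J^{uw} J^{vz} \]
\[ \phantom{(\gamma^2)^{(sgm)}} = \sum_{u,v\in\mathcal{U}} Q^{(sgm)}_{uv,uv}\, J^{uu} J^{vv} \]
\[ \phantom{(\gamma^2)^{(sgm)}} = \sum_{u,v\in\mathcal{U}}
   E_{U,V,\tilde U,\tilde V}\big[ \omega_{\mathcal{U}}(U,V,\tilde U,\tilde V)\,(U^\top V)^2 (\tilde U^\top \tilde V)^2 \big]\,
   \frac{2^{|\sigma(u)|+|\sigma(v)|}}{\|u\|^4 \|v\|^4}. \]

Thus (7) is proved. (8) is shown similarly.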



6.2 Proof of Theorem 4 



We prove Theorem 4 by using the explicit expressions (7) and (8) of the Efron
curvature. We abbreviate $\gamma^2_{\mathcal{U}}$ as $\gamma^2$.

We prove the first inequality in (6). By the expression (7), it is sufficient to
show that $\omega_{\mathcal{U}}(u,u,-u,-u) = 1$ for some $u \in \mathcal{U}$. Let $u$ be
an element such that $\|u\|_1 = \max_{v\in\mathcal{U}} \|v\|_1$. Then we have
$u + u - u - u = 0$ and $u + u \notin \mathcal{U}\cup\{0\}$, and hence
$\omega_{\mathcal{U}}(u,u,-u,-u) = 1$.

The second inequality in (6) follows from equations (7), (8), and

\[ (U^\top V)^2 (\tilde U^\top \tilde V)^2 \le \|U\|^2 \|V\|^2 \|\tilde U\|^2 \|\tilde V\|^2 = \|u\|^4 \|v\|^4. \]

We now consider the equality condition. First assume $\mathcal{U} \subset Z_i$. Then
$(U^\top V)^2 (\tilde U^\top \tilde V)^2$ in (7) is equal to $(u_i v_i)^2 (u_i v_i)^2$,
which is equal to $\|u\|^4 \|v\|^4$. Therefore $(\gamma^2)^{(sgm)} = (\gamma^2)^{(mix)}$.
Conversely, assume $(\gamma^2)^{(sgm)} = (\gamma^2)^{(mix)}$. Since $\mathcal{U}$ is a
non-empty finite subset, there exist some $u \in \mathcal{U}$ and some
$i \in \{1,\dots,m\}$ such that

\[ u_i > 0 \quad \text{and} \quad u_i \ge w_i \quad (\forall w \in \mathcal{U}). \]

Fix such $u$ and $i$. We show $u \in Z_i$. Define an integer vector
$\tilde u \in \mathbb{Z}^m$ by $\tilde u_i = u_i$ and $\tilde u_j = -u_j$ for $j \ne i$.
Since $|u_i + \tilde u_i| = 2u_i > u_i$, we have
$\mathrm{abs}(u + \tilde u) \notin \mathcal{U}\cup\{0\}$ and therefore
$\omega_{\mathcal{U}}(u,\tilde u,-u,-\tilde u) = 1$. Let $\{U_{(k)}\}_{k=1}^4$ be four
independent Bernoulli randomizations of $u$. Note that each $U_{(k)}$ takes $u$ (resp.
$\tilde u$) with probability at least $2^{-m}$. We evaluate

\[ 0 = (\gamma^2)^{(mix)} - (\gamma^2)^{(sgm)} \]
\[ \phantom{0} \ge E_{U_{(1)},U_{(2)},U_{(3)},U_{(4)}}\Big[ \omega_{\mathcal{U}}(U_{(1)},U_{(2)},U_{(3)},U_{(4)})
   \Big( 1 - \frac{(U_{(1)}^\top U_{(2)})^2 (U_{(3)}^\top U_{(4)})^2}{\|u\|^8} \Big) \Big] \]
\[ \phantom{0} \ge 2^{-4m}\, \omega_{\mathcal{U}}(u,\tilde u,-u,-\tilde u) \Big( 1 - \frac{(u^\top \tilde u)^4}{\|u\|^8} \Big) \ge 0. \]

This implies $|u^\top \tilde u| = \|u\|^2$. By the equality condition of the
Cauchy-Schwarz inequality, there is a real number $\rho$ such that $\tilde u = \rho u$.
This implies $u \in Z_i$. Now, by contradiction, assume that there exists some
$v \in \mathcal{U} \setminus Z_i$. We further assume $v_i \ge w_i$ for any
$w \in \mathcal{U} \setminus Z_i$ without loss of generality. Since $u_i + v_i > v_i$ and
$u + v \notin Z_i$, we deduce $u + v \notin \mathcal{U}\cup\{0\}$. Hence
$\omega_{\mathcal{U}}(u,v,-u,-v) = 1$. Then we have

\[ 0 = (\gamma^2)^{(mix)} - (\gamma^2)^{(sgm)}
   \ge 2^{-4m}\, \omega_{\mathcal{U}}(u,v,-u,-v) \Big( 1 - \frac{(u^\top v)^4}{\|u\|^4 \|v\|^4} \Big) \ge 0. \]

This implies $|u^\top v| = \|u\| \|v\|$. By the equality condition of the Cauchy-Schwarz
inequality, there is a real number $\rho$ such that $v = \rho u$. This implies
$v \in Z_i$ and contradicts the definition of $v$. Thus we have
$\mathcal{U} \subset Z_i$.

6.3 Proof of Theorem 6 and Corollary 7 

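We first prove Theorem 6. Put $d = \max_m \max_{u\in\mathcal{U}_m} |\sigma(u)| < \infty$.
We abbreviate $\mathcal{U}_m$ by $\mathcal{U}$ below. It is sufficient to prove that
$(\gamma^2_{\mathcal{U}})^{(sgm)} \le |N(\mathcal{U})|$ and
$(\gamma^2_{\mathcal{U}})^{(mix)} \ge c\,|M(\mathcal{U})|$ with a positive constant $c$.
If $(u,v) \notin N(\mathcal{U})$, then $U^\top V = 0$ in (7). Hence

\[ (\gamma^2_{\mathcal{U}})^{(sgm)} = \sum_{(u,v)\in N(\mathcal{U})}
   E_{U,V,\tilde U,\tilde V}\big[ \omega_{\mathcal{U}}(U,V,\tilde U,\tilde V)\,(U^\top V)^2 (\tilde U^\top \tilde V)^2 \big]\,
   \frac{2^{|\sigma(u)|+|\sigma(v)|}}{\|u\|^4 \|v\|^4}
   \le |N(\mathcal{U})|. \]

We next evaluate (8). If $(u,v) \in M(\mathcal{U})$, then
$\omega_{\mathcal{U}}(u,v,-u,-v) = 1$. Since $u$ has at most $d$ non-zero elements, the
event $U = u$ happens with probability at least $2^{-d}$, where $U$ is a Bernoulli
randomization of $u$. Therefore

\[ (\gamma^2_{\mathcal{U}})^{(mix)} \ge \sum_{(u,v)\in M(\mathcal{U})}
   E_{U,V,\tilde U,\tilde V}\big[ \omega_{\mathcal{U}}(U,V,\tilde U,\tilde V) \big]
   \ge 2^{-4d}\,|M(\mathcal{U})|. \]

This proves Theorem 6.

Next we prove Corollary 7. Assume $|N(\mathcal{U})| / |\mu(\mathcal{U})|^2 \to 0$. Note
that $|\mu(\mathcal{U})| \to \infty$ since $|N(\mathcal{U})| \ge |\mathcal{U}| \ge 1$.
From the definitions of $M(\mathcal{U})$ and $\mu(\mathcal{U})$, the set
$\{ (u,v) \in \mathcal{U}^2 \mid u,v \in \mu(\mathcal{U}),\ u \ne v \}$ is a subset of
$M(\mathcal{U})$. Then we have
$|M(\mathcal{U})| \ge |\mu(\mathcal{U})|\,(|\mu(\mathcal{U})| - 1)$. Thus

\[ \frac{|N(\mathcal{U})|}{|M(\mathcal{U})|} \le \frac{|N(\mathcal{U})|}{|\mu(\mathcal{U})|^2 \big(1 - |\mu(\mathcal{U})|^{-1}\big)} \to 0 \]

and the proof is completed.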